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Abstract 

Given a large set of measurement sensor data, in order to identify a simple function that 
captures the essence of the data gathered by the sensors, we suggest representing the data by 
(spatial) functions, in particular by polynomials. Given a (sampled) set of values, we interpolate 
the datapoints to define a polynomial that would represent the data. The interpolation is 
challenging, since in practice the data can be noisy and even Byzantine, where the Byzantine 
data represents an adversarial value that is not limited to being close to the correct measured 
data. We present two solutions, one that extends the Welch-Berlekamp technique in the case 
of multidimensional data, and copes with discrete noise and Byzantine data, and the other 
based on Arora and Khot techniques, extending them in the case of multidimensional noisy and 
Byzantine data. 
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1 Introduction 



Consider the task of representing information in an error-tolerant way, such that it can be introduced 
even if it contains noise or even if the data is partially corrupted and destroyed. Polynomials are a 
common venue for such approximation, where the goal is to find a polynomial p of degree at most 
d that would represent the entire data correctly. 

Our motivation comes from sensor data aggregation, and the need to extend the distributed 
aggregation to distributed interpolation, use sampling to cope with huge data and anticipate the 
value of missing data. For example, a sensor network may interact with the physical environment, 
while each node in the network is may sense the surrounding environment (e.g., temperature, 
humidity etc). The environmental measured values should be transmitted to a remote repository 
or remote server. Note that the environmental values usually contain noise, and there can be 
malicious inputs, i.e., part of the data may be corrupted. 

In contrast to distributed data aggregation where the resulting computation is a function such 
as COUNT, SUM and AVERAGE (e.g. [9, 13, 16]), in distributed data interpolation, our goal 
is to represent every value of the data by a single (abstracting) function. Our computational 
model consists of sampling the sensor network data and estimating the missing information using 
polynomial manipulations. 

The management of big data systems also gives motivation for the distributed interpolation 
method. The abstraction of big data becomes one of the most important tasks in the presence of 
the enormous amount of data produced these days. Communicating and analyzing the entire data 
does not scale, even when data aggregation techniques are used. This study suggests a method to 
represent the distributing big data by a simple abstract function (such as polynomial) which will 
lead to effective use of that data. 

We suggest interpolating the big data in the scope of distributed systems by using local data 
centers. Each data center samples the data around it and computes a polynomial that reflects the 
local data. The local polynomials are merged to a global one by interpolation in a hierarchical 
manner. In the process of calculating the local polynomials noise and Byzantine data samples are 
eliminated. 

Basic Definitions. 

• For multivariate polynomial p(x) G R[x] = Xk] let WpW^ = sup{\p(xi, Xk)\ ■ x\, Xk £ R}. 

• A monomial in a collection of variable Xi, ...,x n is a product 

•^l x 2 x n 

where ccj are non- negative integers. 

• The total degree of a multivariate polynomial p is the maximum degree of any term in p, 
where the degree of particular term is the sum of the variable exponents. 

• A polynomial q is a 5 -approximation to p if \\p — qW^ < 5. 

Polynomial Fitting to Noisy and Byzantine Data. 

Formally, in this paper, we learn the following problem: 
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Definition 1.1 (Polynomial Fitting to Noisy and Byzantine Data Problem). Given a sample Sofk 
dimension datapoints {(x^, ■■■,Xk i )}fLi an< ^ & function f defined on those points f{x\ i} = y%, 

a noise parameter 5 > and a threshold p > 0, we have to find a polynomial p of total degree d 
satisfying 

p(x\, Xk) £ [y — 5, y + S] for at least p fraction of S (1) 

Generally, we propose the use of polynomials to represent large amounts of sensor data. The 
process works by sampling the data and then using this sample to construct a polynomial whose 
distance (according to the metric) from the polynomial constructed using the whole data set 
is small. The main challenges to this approach are (i) the presence of noise (identified by the 
5 parameter), and (ii) arbitrarily corrupted data (Byzantine data, denoted by p) that can cause 
inaccurate sampling and, thus, lead to badly constructed polynomials. 

Given that the function / is continuous, by the Weierstrass approximation Theorem [4] we know 
that for any given e > 0, there exists a polynomial p' such that 

\\f-p'\L<* (2) 

This can tell us that our desired polynomial p exists (i.e., p' = p and e = 5, satisfying eq.l), and we 
can relate the data as arising from polynomial function (i.e., the unknown function / is d degree 
polynomial we need to reconstruct), and this is the underlying model assumed in the paper. 

One obvious candidate to construct approximating polynomial is interpolation at equidistant 
points. However, the sequence of interpolation polynomials does not converge uniformly to / for 
all / 6 C[0, 1] due to Runge's phenomenon [7]. Chebyshev interpolation (i.e., interpolate / using 
the points defined by the Chebyshev polynomial) minimizes Runge's oscillation, but it is not suffice 
the polynomial fitting problem presented above (Definition 1.1) due to the randomly distributed 
data we have assumed. 

Taylor polynomials are also not appropriate; for even setting aside questions of convergence, 
they are applicable only to functions that are infinitely differentiable, and not to all continuous 
functions. 

Another classical polynomial sequence is suggested by S. Bernstein [31 as constructive proof of 

, .^fr©©^-— 

uniformly to any continuous function / which is bounded on [0,1]. The formal Berenstein poly- 
nomial samples the function / in an equidistant fashion. To handle a random sample data, we 
can use Vitale [21] results which consider that the datapoints S = x±, x^ are i.i.d observations 
drawn from an unknown density function /. The Bernstein polynomial estimate of f defined as 

Bn{x) = ^j^- ^2 vfn ( ■ ) X% Q ~ x ) n ~ % where pf n is the number of points (xj's) appear in the interval 



[— 

Ln+1 ' n+1 



Vitale [21] showed that B f n (x) - f 



< e for every given e > 0. 

Tenbusch [20] extended Vitale's idea to multidimensional densities, where there is need to note 
that those works hold only when the datapoints are i.i.d observations. Another reason not to use 
the Bernstien polynomial is the slow convergence rate (Voronovskaya's Theorem states that for 
functions that are twice differentiable, the rate of convergence is about 1/n, see Davis [7]). 

Considering other classical curve-fitting and approximation theories [17], most research has used 
the £2 norm of noise, such as the method of least square errors. These attitudes not suffice the 
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adversarial noise we have assumed here. To our knowledge, only [2] referred the noise that fits 
our considered problem and we further relate [2] study. 

The polynomial fitting problem as stated in Definition 1.1 can also be studied by Error- 
Correcting Code Theory. From that point of view, extensive literature exists dealing with the 
noise- free case (i.e., 5 = and p < 1). In the next section, we present an algorithm that handles 
a combination of discrete noise and Byzantine data based on the Welch-Berlekamp [22] error- 
elimination method. Moreover, the fundamental Welch-Berlekamp algorithm treats only the one- 
dimension case, where we suggest a means to deal with corrupted-noisy data appearing at one and 
multi-dimensional inputs. 

Related to unrestricted noise, we refer to the polynomial-fitting problem as defined by Arora 
and Khot [2]. Based on their results in Section 3, we introduce the polynomial fit generalization, 
where we provide a polynomial-time algorithm dealing with multi-variate data. 

Summarizing, this work provides the following contributions: 

• We describe an algorithm that constructs a polynomial using the Welch-Berlekamp (WB) 
method as a subroutine. The algorithm is tolerated to discrete-noise and Byzantine data. 

• We identify how the previous method can be generalized to handle multi-dimensional data. 
Moreover, we present a multivariate analogue of the WB method, under conditions which will 
be specified. 

• Using linear programing minimization and the Markov-Bernstein Theorem, we generalized 
Arora and Khot algorithm to reconstruct an unknown -multi- dimensional polynomial. Fur- 
thermore, we detail the way to eliminate the Byzantine appearance when such inputs exist. 

Those three points stated in the three algorithms presented in the paper. The first Algorithm 
handles one-dimensional Byzantine data that contains discrete-noise. Algorithm 2 generalized the 
WB idea to deal with multivariate malicious data. Finally, Algorithm 3 summarized our approach 
to cope with unrestricted noise appeared in the (partially corrupted) data. 

2 Discrete Finite Noise 

In this section, we will study a simple aspect of the polynomial fitting problem posed in Defini- 
tion 1.1, where the data function is a polynomial, and we relaxed the noise constraint to be finite 
and discrete, i.e., the noise 5 is defined on a finite field ¥ q containing q elements. 

Welch and Berlekamp related the problem of polynomial reconstruction in their decoding algo- 
rithm for Reed Solomon codes [22]. The main idea of their algorithm is to describe the received 
(partially corrupted) data as a ratio of polynomials. Their solution holds for noise-free cases and 
a limited fraction of the corrupted data (<5 = 0, p > 1/2). Almost 30 years later, Sudan's list 
decoding algorithm [19] relaxed the Byzantine constraint (5 = Q,p can be less than 1/2) by using 
bivariate polynomial interpolation. Those concepts do not hold up well in the noisy case since 
they use the roots of the polynomial and the divisibility of one polynomial by other methods that 
are problematic for noisy data (as shown in [2], Section 1.2). Here, we will use the WB algo- 
rithm [22] as a "black box" to obtain an algorithm that handles the discrete-noise notation of the 
polynomial-fitting problem. 

Given a data set {(x«, yi)}f = i that is within a distance of t = pN from some polynomial p(x) of 
degree < d, the WB approach to eliminate the irrelevant data is to use the roots of an object called 
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the error-locating polynomial, e. In other words, we want e(xj) = whenever p(xi) 7^ yj. This is 
done by defining another polynomial q{x) = p(x)e(x). To resolve these polynomials we need to 
solve the linear system, q(xi) = yie{xi) for all i. 

Welch and Berlekamp show that e(x)\q(x) and p(x) can be found by the ratio p(x) = q(x)/e(x) 
at 0(N U ') running time (where oj is the matrix multiplication complexity). In Algorithm 1, we are 
use the WB method as a subroutine to manage the noisy-corrupted data. 

Algorithm 1 Reconstruct the polynomial p(x) representing the true data 
Require: S, S' C S, p, d,5,A = v\, v\$>\ 

i 

repeat 

i <r- i + 1 

Pi {x)^WB{Sld,p) 
until Pi(xj) £ [yj — 6, yj + 5] for at least p fraction of j's; (xj,yj) 6 S — 
return p{x) <r- pi(x) 



Given any sample S such that p fraction of S is not corrupted, we will choose a subset S' C S 
in a size related to the desired degree d and p (the WB algorithm requires It + d points, where 
t = pN is the number of the corrupted points). At every step i, we will add S' different values of 
noise as defined by the set A which contain all the vectors of length assigned the elements of ¥ q 
in lexicographic order, i.e., A = {(a±, ...,015/1) : a« G F^}. Now, we can reconstruct the polynomial 
Pi using the WB algorithm. The resulting polynomial pi is tested by the original dataset S, where 
the criteria is that pi is within S from all nodes but the Byzantine nodes (according to the maximal 
number of Byzantine as defined by p) . 

Since we assume a discrete finite noise (5 £ ¥ q ), for each datapoint at the subset S' (of size 
0(d + p\S\)), there is a possibly of q values (where q is a constant). Thus, in the worst case, when 
we run the WB polynomial algorithm for every possible value, it will cost poly(d + N) time. 

Note that if the desired polynomial's degree d is not given, we can search for the minimal degree 
of a polynomial that fits the 5 and number of Byzantine node restrictions in a binary search fashion. 

Multidimensional Data. 

To generalize the former algorithm to handle multidimensional data, there is need to formalize the 
WB algorithm to deal with multivariate polynomials. This is a challenging task due to the infinite 
roots those polynomials may have (and as previously mentioned, the WB method is strictly based 
on the polynomials' roots). 

A suggested method to handle 3-dimensional data is to assume that the values of datapoints 
in one direction (e.g., x-direction) are distinct. This can be achieved by assuming the inputs S = 
(xi, yi, f(x±, yi))..., (xjv, Vn, f(xN, Vn)) are i.i.d observations. Moreover, we allow the malicious 
authority to change the observation input but not its distribution (i.e., to determine Z{ = f(xi,yi) 
value only). This assumption forces the data to have different Xj's values, which help us to define 
the error locating polynomial e in the x-direction only (or symmetrically over the y-axis). 

The 3-dimensional polynomial reconstruction is described in Algorithm 2. 
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Algorithm 2 Reconstruct the polynomial p(x, y) representing the true data 



• Input: < t = pN which is the Byzantine appearance bound, the total degree d > 1 of the 
goal polynomial and N triples (xi,yi, Zi)^ =1 with distinct Xj's . 

• Output: Polynomial p(x, y) of total degree at most d or fail. 

• Step 1: Compute a non-zero univariate polynomial e(x) of degree exactly t and a bivariate 
polynomial q(x, y) of total degree d + t such that: 

Zie(xi) = q(xi,yi) l<i<N (3) 

If such polynomials do not exist, output fail. 

q(xj y) 

• Step 2: If e does not divide q, output fail, else compute p(x, y) = — t^t— ■ If A(zi,p(xi, yi)i) > 

e(x) 

t, output fail, else output p(x,y). 



Theorem 2.1. Let p be an unknown d total degree polynomial with two variables. Given a threshold 
p > and a sample S of N = (^J^™) +t (t = pN ) random points (x«, yi, Zi)^ =1 such that 

Zi = p(xi, yi) for at least p fraction of S. 

The algorithm above reconstructs p at 0(N U1 ) running time (where oj is the matrix multiplication 
complexity) . 

Proof. The proof of the Theorem above follows from the subsequent claims. 

Claim 2.2 (Correctness). There exist a pair of polynomials e(x) and q(x,y) that satisfy Step 1 
such that q(x, y) = p(x, y)e{x) . 

Proof. Taking the error locator polynomial e(x) and q(x, y) = p(x, y)e(x), where deg(q) < deg(p) + 
deg(e) <t + d. By definition, e(x) is a degree t polynomial with the following property: 

e(x) = iff Zi ^ p(x,y) 

We now argue that e(x) and q(x, y) satisfy eq. 3. Note that if e(xj) = 0, then q(xi, yi) = Zie(xi) = 0. 
When e(xj) / 0, we know p(xi, yi) = Zi and so we still have p(xi, yi)e{xi) = Zie(xi), as desired. □ 

Claim 2.3 (Uniqueness). If any two distinct solutions (qi(x,y),ei(x)) / (q2(x, y), &i (x)) satisfy 

Step 1, then they will satisfy — ' ^} = ^ ' ^} . 

ei(x) e 2 (x) 

Proof. It suffices us to prove that gi(x, y)e2(x) = (q2(x, y)e\(x). Multiply this with zi and substitute 
x,y with Xi,yi, respectively, 

qi(xi,yi)e 2 (xi)zi = q2{xi,yi)e 1 (x i )z i 

We know, Vi G [N] qi(xi,yi) = e\{xi)Zi and q 2 (xi,yi) = e2(xi)zi If Zi = 0, then we are done. 
Otherwise, if Zi / 0, then qi(xi, yi) = 0,q^Xi,yi) = =>- q±(x,y)e2(x) = (q2(x,y)ei(x) as desired. □ 
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Claim 2.4 (Time complexity). Given N = t+ ( d ^^ 2 ) data samples, we can reconstruct p(x,y) 
using 0(N UJ ) running time. 

Proof. Generally, for m variate polynomial with degree d, there are ( d "^ m ) terms [18]; thus, it is a 
necessary condition that we have t + ( d ^tJ 2 ) distinct points for q and e to be uniquely defined. We 
have N linear equation in at most N variables, which we can solve e.g., by Gaussian elimination 
in time 0(N U} ) (where u is the matrix multiplication complexity). 

Finally, Step 2 can be implemented in time O(NlogN) by long division [1]. Note that the 
general problem of deciding whether one multivariate polynomial divides another is related to 
computational algebraic geometry (specifically, this can be done using the Grobner base). However, 
since the divider is a univariate polynomial, we can mimic long division, where we consider x to be 
the "variable" and y to just be some "number." 

□ 

Example 1. Suppose the unknown polynomial is p(x,y) = x + y. Given the parameters: d = 1 
(degree of p), m = 2 (number of variable at p) and t = 1 (number of corrupted inputs) and the set 
of t + = 7 points: 

(1,2,2),(2,2,4),(6,1,7),(4,3,7), (8,2,0), (9,1,10), (3,7,10) 

that lie on z = p(x, y). Following the algorithm, we define: deg(e) = 1, deg(q) = 2 and 

qi = a.\x\ + a 2 Xiyi + a 3 yf + a 4 Xj + a 5 yi + a 6 = Zi(xi + a 7 ) 

for coefficients a±, ...,ae,/3 and 1 < i < 12. Note that we force e(x) not to be the zero polynomial 
by define it to be monic (i.e., the leading coefficient equals to 1). Thus, we derive the linear system: 

Oil + «2 + «3 + «4 + «5 + "6 = 2/3 + 2 
4a i + 4a 2 + 4a 3 + 2a 4 + 2a 5 + a 6 = 4/3 + 8 
36ai + 6«2 + «3 + 6a 4 + as + «6 = 7/3 + 42 
16ai + 12a 2 + 9a 3 + 4a 4 + 3a 5 + a 6 = 7/3 + 28 
64«i + 16a 2 + 4a3 + 8«4 + 2a5 + a§ = 
81ai + 9a 2 + a 3 + 9a 4 + a 5 + a 6 = 10/3 + 90 
9ai + 21a 2 + 49a 3 + 3a 4 + 7a 5 + a 6 = 10/3 + 30 

Solving the system gives: </(a;, y) = x 2 + xy — 8x — 8y and e(x) = x — 8. Dividing those polynomials, 
we get the expected solution: q(x, y)/e(x) = p(x) = x + y. 

Corollary 2.5 (Multivariate Polynomial Reconstruction). Let p be an unknown d total degree 
polynomial with m variable. Given a threshold p > 0, a noise parameter 5 and a sample S of N 
random points (x^, x mi , yi)^ =1 such that 

Vi G \pixii, ...,x mi ) - S,p(xi i ,...,x mi ) + 5] for at least p fraction of S 

p can be reconstructed using N = ( d+ J^f m ) + P™ datapoints. 
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Proof. Following Theorem 2.1, we can rewrite its proof for the multidimensional generalization. An 
interesting question is if there is any advantage to define the error- location polynomial e to be multi- 
variate (instead of univariate as we previously presented). One "advantage" may be decreasing the 
required number of datapoints or improvement of the complexity. The size of the input data is 
strictly defined by the given bound on the corrupted data (t = pN) and the goal polynomial degree 
(d). Thus, there is no sense if the unknown coefficient comes from the highly degree polynomial 
or from the highly dimension polynomial, i.e., both options require the same size of sample, as 
illustrated in the Appendix. 

Related to the complexity change, when the error-locating polynomial is multivariate, Step 2 
of Algorithm 2 is more challenging since it contains multivariate polynomial division. A related 
reference is [10] which is the most efficient implementation for the computation of Grobner bases 
relies on linear algebra. Using Grobner bases we can implement the division at close to O(NlogN) 
time, as done in Algorithm 2. 

To finish the proof, there is the need to explain how to deal with the noise. Since we assume 
only discrete noise, we can dismiss it using the method illustrated at Algorithm 1. We consistently 
insert a vector of possible noise and try to reconstruct the polynomial using Algorithm 2. □ 

3 Random Sample with Unrestricted Noise 

Motivated by applications in vision, Arora and Khot [2] studied the univariate polynomial fitting to 
noisy data using 0(d 2 ) datapoints, where d is the polynomial degree. In this part, we generalized 
their results to k- dimensional data. 

Since our motivation comes from sensor planar aggregation, we will focus on bivariate polyno- 
mial reconstruction, where the multivariate proof is symmetric. We assume by rescaling the data 
that each Xi, m, f(xi,yi) £ [—1, 1]. Allowing small noise at every point and large noise occasionally 
then there may be too many polynomials agreeing with the given data. Thus, given the noise 
parameter 5, our goal is to find a polynomial p that is a ^-approximation of /, i.e., p is 5-close in 
loo norm to the unknown polynomial. 

Let / be a set of d 5 equally spaced points that cover the interval [—1,1]. Given the random 
sample S C I, \ S\ = ^-log(^), we approach the reconstruction problem by defining a linear pro- 
gramming system with the fitting polynomial as its solution. To incorporate the constraint that 
the unknown polynomial must take values of [—1, 1], we move to Chebyshev's representation of the 
polynomial. Thus, each of its coefficients is at most a/2 (see eq. 5). We represent Chebyshev's 
polynomial by Tj (■),!}(■), and the variables Cij at the system are the Chebyshev coefficients. We 
construct the LP: 



minimize 5 



s.t. 



n m 



f(xk,Vk) ~ S < ^^CijT^x^Tjiyk) < f(x k ,y k ) + 5, k = 1, \S\ 



(4) 



» 3 



(Hj\ < a/2, i = l, ...,n;j 



m 



(5) 
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n m 



^^CijTiWTjMlKl, V XtV €l (6) 

» 3 

The following Theorem presents our main result for solving the polynomial fitting problem: 

Theorem 3.1. Let f be an unknown d total degree polynomial with two variables, such that 
f(x,y) £ [—1, 1] when x,y G [—1, 1]. Given a noise parameter 5 > 0, a threshold p > 0, a constant 
c > (dependent on the dimension of the data) and a sample S of 0(^-log(^)) random points 
Xi, Vi, Zi £ [—1, 1] such that Z{ £ [f(xi,yi) — 5, f(xi, yi) + 5] for at least p fraction of S. With proba- 
bility at least | (over the choice of S), any feasible solution p to the above LP is cS- approximation 
off- 

Proof. For our proof, we need Bernstein-Markov inequality which we state below. 

Theorem 3.2. (Bernstein-Markov [8]) For a polynomial Pd of total degree d, a direction £ and a 
bounded convex set A C R k 



< c A d 2 IIP.IL (7) 



where c A is independent of d (and dependent on the geometric structure of A). 

Let p = p(x, y) be the d = n + m total degree polynomial obtained from taking any solution to 
the above LP. We know p exists, i.e., the LP is feasible because the coefficient of / satisfies it. 
Note that although the LP constraint that 

f-5< P <f+6 

it is NOT immediate that 

Il/-P|loo<* 

(i.e., p is the ^-approximation of /) since eq. 4 stands for the discrete sample of S, where here we 
need to prove the 8- approximation in the continuous state. 

Claim 3.3. < 1 + 0( " 3 ™+™ 3 " ), 

Since HT^x)^ = HT^y)^ = 1, from Bernstein-Markov (Theorem 3.2), we get \T?(x)\ = 0(i 2 ). 
Thus: 



n m / 

Wx\<^'EK\(Ti(x)T j (y)) x 

i 3 
n m 



* 3 

(8) 



From symmetric consideration, we get: 
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\p' y \ < 0(nm 3 ) 

(9) 

By construction, p takes all values in [—1, 1] for all points in /, and the distance (in x direction 
or y direction) between successive points of / is 2/\I\ (I is equidistant). The claim follows from 
the fact that the derivative p' by definition gives the rate of change in p: WpW^ is between two 
successive points of / that their values = 1 where the possible change at the interval of length 2/\I\ 
is p' = 0{r?m + m 3 n). 

Claim 3.4. Wp'J^ , Hj/J^ < 0((n + mf). 

This follows from Bernstein-Markov (Theorem 3.2) and the estimate WpW^ = 1 + 0(1)- 
Let e denote the largest distance between two successive points out of (x±, yi), (x\$\, U\s\)- Ev- 
ery interval of size e contains at least one of the datapoints (forming e-net). With high probability, 
e = 0(log\S\/\S\) = Now, p and / are functions satisfying Wp'J^ , , Hj/J^ , WftW^ < 

0((n+m) 2 ); hence, \\(p — /^H^ , \\(p — f)' y \\ < 0((n+m) 2 ). Due to the LP constraint, p, f differs 
by at most 5 on the points in the e-net, so we get 



||(p-/)|| 0O <2 ( 5 + 0(e(n + m) 2 ) = C <5 (10) 
which is the finished proof of Theorem 3.1; □ 
Remarks: 

• If we know that the derivative is bounded by A (i.e., f' x , f' y < A), the above proof that gives 
us j-log^ points is sufficient. 

• The Bernstein-Markov Theorem 3.2 also holds for multivariate trigonometric polynomials 
(see [8]), thus, we can generalize the above proof also for this class of function. This gener- 
alization is important in the scope of wireless sensor networks since the use of trigonometric 
function is the appropriate way to represent the sensor data behavior (e.g., temperature). 

• The presented method holds only when we assume equidistance or random sampling (as 
opposed to Section 2 that handles any given sample). Otherwise, when the dataset is dense, 
since we allow 8 perturbation of the data, it can cause a sharp slope in the resulting function 
although the original data is close to the constant at the sampling interval. 

Corollary 3.5. Given the set S of 0(^-log(^)) k- dimensional random datapoints and a constant 
c(S, k) dependent only on the geometry and the dimension of the data, we can reconstruct the 
unknown polynomial within c(S, k)5 error in £cq norm with high probability over the choice of the 
sample. 

Proof. The two-dimensional proof holds for the general dimension, where the approximation ac- 
curacy dependent on the constant c(S, k) comes from Theorem 3.2. This constant is independent 
of the polynomial degree, but dependent on the set of the data points (see [8]). Note that c(S,k) 
increases exponentially when increasing the dimension. □ 
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Byzantine Elimination. 

Arora and Khot [2] do not deal with Byzantine inputs; however, the method they presented in 
Section 6 can be rewritten to eliminate corrupted data such that the input datapoints will contain 
only true (but noisy) values. 

Assume that p fraction of the data is uncorrupted. For any point Xi,yi G [—1,1], consider a 
small square- interval A = [xi — Jf, x« + Jj] x [y« — Jr, y« + Jr] (where d is the total degree of the 
polynomial we need to find). For a sample of d A l ° 9 points, with high probability Q(log(d)) of the 
samples lie in this square. We are given that p fraction of these sample points gives an approximate 
value of f(xi,yi), i.e., the correct value lies in the interval [f(xi,yi) — S,f(xi,yi) + S] and the rest 
of the sample is corrupted and, thus, is NOT in [f(xi, yi) — S, f(xi, yi) + 5]. As shown in Claim 3.4, 
the derivatives are bounded by 0(d 2 ); thus, the value of the polynomial is essentially constant over 
A. Hence, at least p fraction of the values seen in this square will lie in [f(xi,yi) — 5, f(xi,yi) + 8] 
and the rest is irrelevant corrupted data. Thus, at every point (xj,yj), we can reconstruct f(xi,yi). 
The sample is large enough so that we can reconstruct the values of the polynomial at say, d 2 /8 
equally spaced points. Now, applying the techniques presented in Section 3 enables us to recover 
the polynomial. 

Reconstructing the Multivariate Polynomial. 

To conclude this section, we summarize the presented results in Algorithm 3: The algorithm requires 

Algorithm 3 Reconstruct the polynomial p(x, y) representing the true data 
Require: S, p, d, 5 

S' ^0 

% <- 1 

repeat 

A=[Xi-j S ,X i +j s ] X [y i -S s ,y i +S s] 

Zl+ ~ k +Zk , Zj : (xj,yi) G A 
S' <- {(xj,yj,Zj)\(xj,yj) G A A zj w c} 

until \S'\ > f 

p(x, y) -(—LP minimization (Equations 4-6) on the set S' 
return p(x,y) 



the dataset S, the true-data fraction p, the total degree of the expected polynomial d and the noise 
parameter 5. In the first phase, we eliminate the Byzantine occurrence, as described in the former 
subsection. Assuming the given data lie in [—1,1] x [—1,1] (or translate to that interval), for the 
points in S, we are looking at the ^--close interval and choose all the points that have constant 
value at this interval (this is done by the average operation) . We repeat this process until we collect 
enough true-datapoints, i.e., at least ^ points. This set (sign as S' in the algorithm) is the input 
for the linear-programming equations which finally give us the expected polynomial as proof at 
Theorem 3.1. 
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4 Conclusions 



We have presented the concept of data interpolation in the scope of sensor data aggregation and 
representation, as well as the new big data challenge, where abstraction of the data is essential in 
order to understand the semantics and usefulness of the data. Interestingly, we found that classical 
techniques used in numeric analysis and function approximation, such as the Welsh-Berlekamp 
efficient removal of corrupted data, Arora Khot and the like, relate to the data interpolation 
problem. Since the sensor aggregation task is usually a collection of inputs from spatial sensors, 
for the first time we have extended existing classical techniques for the case of three or even more 
function dimensions, finding polynomials that approximate the data in the presence of noise and 
limited portion of completely corrupted data. 

We believe that the mathematical techniques we have presented have applications beyond the 
scope of sensor data collection or big data, in addition to being an interesting problem that lies 
between the fields of error-correcting and the classical theory of approximation and curve fitting. 

Throughout the research we have distinguished two different measures for the polynomial fit- 
ting to the Byzantine noisy data problem: the first being the Welsh-Berlekamp generalization for 
discrete-noise multidimensional data and the second being the linear-programming evaluation for 
multivariate polynomials. 

Approached by the error-correcting code methods, we have suggested a way to represent a noisy- 
malicious input with a multivariate polynomial. This method assumes that the noise is discrete. 
When the noise is unrestricted, based on Bernstein-Markov Theorem and Arora & Khot algorithm, 
we have suggested a method to reconstruct algebraic or trigonometric polynomial that traverses p 
fraction of the the noisy multidimensional data. 

We suggest to use polynomial to represent the abstract data since polynomial is dense in the 
function space on bounded domains (i.e., they can approximate other functions arbitrarily well) and 
have a simple and compact representation as oppose to spline e.g., [11] or others image processing 
methods. 

Directions for further investigation might include the use of interval computation for represent- 
ing the noisy data with interval polynomials. 
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A Appendix 

Given that the goal unknown polynomial p has m = 2 variable, deg(p) = 1 and that the data contain 
t = 2 Byzantine appearance, we can define the error-correcting polynomial e to be univariate 
polynomial, deg(e) = 2 and get the linear equation: 

a\x 3 + a2X 2 y + a^xy 2 + a^y 3 + a^x 2 + a^xy + ajy 2 + agx + agy + aio = z(x 2 + /3ix + j3 2 ) (11) 

when substitute the given data 

(1,2,2),(-2,6,0;),(2,2,4),(6,1,7),(4,3,7),(9,1,10),(3,7,10),(5,7,12),(7,4,11),(10,3,13),(1^ 

at (eq.ll) we get: 

qi (x, y) = xy — 2y — 2x + x 2 y + x 2 + x 3 
e\(x) = x 2 + x — 2 

Or, by defining e to be bivariate polynomial,de^(e) = 1: 

qix 3 + a 2 x 2 y + a 3 xi/ 2 + a 4 y 3 + a 5 x 2 + a 6 xy + a 7 y 2 + a 8 £ + a 9 y + ai = .z(x + fiiy + /%) (12) 

which its solution is: 

q 2 (x, y) = x 2 + 7xy/4 - 5x/2 + 3y 2 /4 - 5y/2 
e 2 (x,y)=x + 3y/4-5/2 

At both cases the polynomial devision result is equals and gives the expected solution: 

Qi(x,y)/ei(x,y) = q 2 (x,y)/e 2 (x,y) = x + y = p(x). (13) 
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